Using Decision List for Farsi Word Sense Disambiguation

Authors

  • Mohammad Mehdi Homayounpour
  • Raheleh Makki
Abstract

This paper describes Farsi word sense disambiguation in unrestricted text using a decision list. A decision list is a rule-based algorithm that searches the training data for discriminatory features and extracts a set of rules from them; these rules are then used to disambiguate word senses. Because the method is supervised and corpus-based, it requires a sense-tagged Farsi corpus. We started from a raw corpus and labeled a subset of it manually. To evaluate the method, we applied it to 20 Farsi homographs. Comparing the disambiguation results with baselines shows the effectiveness of the method. The method was also compared with K-Nearest Neighbor (KNN), an exemplar-based method. All evaluations used 10-fold cross-validation.
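The abstract does not say how rules are scored or ordered, so the following is only a minimal sketch of one common decision-list formulation, not the authors' implementation. It assumes that features are bag-of-words context tokens, that rules are ranked by a smoothed log-odds score, and that the most frequent sense serves as the fallback. The function names, the smoothing constant `alpha`, and the toy training data (English glosses standing in for contexts of a two-sense homograph) are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_decision_list(examples, alpha=0.1):
    """Build a ranked rule list from (context_words, sense) pairs.

    Scoring by smoothed log-odds is an assumption of this sketch;
    the abstract does not specify the ranking criterion.
    """
    sense_counts = Counter(sense for _, sense in examples)
    default = sense_counts.most_common(1)[0][0]  # most-frequent-sense fallback
    senses = list(sense_counts)
    if len(senses) < 2:
        return [], default  # nothing to disambiguate

    # Count how often each context word co-occurs with each sense.
    feature_sense = defaultdict(Counter)
    for words, sense in examples:
        for w in set(words):
            feature_sense[w][sense] += 1

    rules = []
    for w, counts in feature_sense.items():
        total = sum(counts.values())
        for s in senses:
            # Smoothed estimate of P(sense | feature).
            p = (counts[s] + alpha) / (total + alpha * len(senses))
            score = math.log(p / (1.0 - p))
            if score > 0:  # keep only rules that favor this sense
                rules.append((score, w, s))

    # Strongest evidence first.
    rules.sort(key=lambda r: r[0], reverse=True)
    return rules, default


def classify(rules, default, context_words):
    """Return the sense of the first (highest-scoring) matching rule."""
    context = set(context_words)
    for _, w, s in rules:
        if w in context:
            return s
    return default


# Hypothetical toy data: contexts of a homograph with two senses.
examples = [
    (["drink", "cold", "glass"], "milk"),
    (["drink", "breakfast", "glass"], "milk"),
    (["cold", "bottle"], "milk"),
    (["drink", "warm"], "milk"),
    (["jungle", "roar"], "lion"),
    (["zoo", "roar", "cage"], "lion"),
    (["jungle", "hunt"], "lion"),
]
rules, default = train_decision_list(examples)
```

Under these assumptions, `classify(rules, default, context)` applies only the single strongest matching rule, which is what distinguishes a decision list from methods that combine evidence from all context features.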

Similar resources

Word Sense Disambiguation of Ambiguous Farsi Words with the LDA Topic Model

Word sense disambiguation is the task of identifying the correct sense of a word in a given context from a finite set of possible senses. In this paper, a model for Farsi word sense disambiguation is presented. The model uses two groups of features: first, all words and stop words around the target word, and second, topic models. We extract topics from a Farsi corpus with Latent Dirichlet ...

Standard Test Collection for English-Persian Cross-Lingual Word Sense Disambiguation

In this paper, we address the shortage of evaluation benchmarks for the Persian (Farsi) language by creating and making available a new benchmark for English-to-Persian Cross-Lingual Word Sense Disambiguation (CL-WSD). In creating the benchmark, we follow the format of the SemEval 2013 CL-WSD task, so that the tools introduced for that task can also be applied to the benchmark. In fact, the new benc...

Improving the Collocation Extraction Method Using an Untagged Corpus for Persian Word Sense Disambiguation

Word sense disambiguation is used in many natural language processing fields. One approach to disambiguation is the decision list algorithm, a supervised method. Supervised methods are considered the most accurate machine learning algorithms, but they are strongly affected by the knowledge acquisition bottleneck, which means that their efficiency depends on the size of the tagg...

Detection of Japanese Homophone Errors by a Decision List Including a Written Word as a Default Evidence

In this paper, we propose a practical method to detect Japanese homophone errors in Japanese texts. It is very important to detect homophone errors in Japanese revision systems because Japanese texts suffer from homophone errors frequently. In order to detect homophone errors, we have only to solve the homophone problem. We can use the decision list to do it because the homophone problem is equ...

Influence of Morphology in Word Sense Disambiguation for Tamil

Many Word Sense Disambiguation (WSD) algorithms do not take into account the morphological variations in the language. However, because Indian languages are highly inflected and morphologically rich, we investigate whether morphology must be taken into account in WSD for these languages. This paper analyses the influence of morphology in WSD for Tamil. We believe our results ...

Publication date: 2008